Methods of Statistical Data Compression

Author

  • Steven de Rooij
Abstract

Data compression is important not only for conserving resources; it also has applications in cryptography and it can be used as an estimator for redundancy in the data: this has many applications, such as prediction, classification and other difficult problems in machine learning. We study algorithms that perform lossless statistical data compression. Statistical data compression is attractive because it allows for separation of the problems of modelling and coding, both of which will be treated here. It seems safe to say that with the development of arithmetic coding in 1976, the problem of coding has been solved satisfactorily, while the problem of modelling remains very difficult to this day. We will restrict ourselves to online modelling. In chapter 2 we study the theoretical background of statistical data compression, relating results of information theory and probability theory to coding and modelling. Then we focus on more concrete issues: in chapter 3 we treat an adaptation of Ukkonen’s algorithm for the online construction of suffix trees, which we use to maintain the particular information about the input stream that we need for the construction of the kind of models that we study. We will find that, for our purposes, a problem with online suffix trees is that they do not explicitly represent the end of the input stream in the tree structure. An investigation of the subtle properties of online suffix trees will help us to formulate a solution to this problem. Finally, in chapter 4 we describe PPM and point out its inherent suboptimality. We then develop two alternative source models which are motivated by entirely different arguments. A comparison will show that these alternatives provide promising performance, warranting further research.
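The separation of modelling and coding mentioned in the abstract can be illustrated with a minimal sketch. It is not taken from the thesis: the class `AdaptiveOrder0Model`, the function `ideal_code_length_bits`, and the choice of an add-one (Laplace) estimator over a 256-symbol byte alphabet are all assumptions made for illustration. The model only supplies a probability for the next symbol and is updated online; the coder is stood in for by the ideal code length, the sum of -log2 p over the symbols, which an arithmetic coder approaches closely.

```python
import math
from collections import Counter


class AdaptiveOrder0Model:
    """Hypothetical online model: add-one (Laplace) estimates, no context."""

    def __init__(self, alphabet_size=256):
        self.alphabet_size = alphabet_size  # assumed: byte alphabet
        self.counts = Counter()
        self.total = 0

    def probability(self, symbol):
        # Probability of the next symbol given only the counts seen so far.
        return (self.counts[symbol] + 1) / (self.total + self.alphabet_size)

    def update(self, symbol):
        # Online step: the model learns the symbol after it has been coded.
        self.counts[symbol] += 1
        self.total += 1


def ideal_code_length_bits(data, model):
    """Sum of -log2 p(symbol) over the input, updating the model as we go.

    This stands in for the coder: it reports the length an ideal coder
    would need under the model's predictions, without producing bits.
    """
    bits = 0.0
    for symbol in data:
        bits += -math.log2(model.probability(symbol))
        model.update(symbol)
    return bits


if __name__ == "__main__":
    repetitive = b"a" * 1000
    varied = bytes(range(256))
    print(ideal_code_length_bits(repetitive, AdaptiveOrder0Model()))
    print(ideal_code_length_bits(varied, AdaptiveOrder0Model()))
```

Because the coder only consumes probabilities, the model can be swapped for a context-based one without touching the code-length computation. The same quantity also serves as the redundancy estimate the abstract refers to: highly repetitive input costs far fewer than 8 bits per byte under this model, while input in which every byte value is new costs more.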

Similar resources

A Fano-Huffman Based Statistical Coding Method

Statistical coding techniques have been used for lossless statistical data compression, applying methods such as Ordinary, Shannon, Fano, Enhanced Fano, Huffman and Shannon-Fano-Elias coding methods. A new and improved coding method is presented, the Fano-Huffman Based Statistical Coding Method. It holds the advantages of both the Fano and Huffman coding methods. It is more easily applicable th...

Investigation of the neuroprotective effect of alcoholic extract of cannabis (Cannabis sativa) leaves on degeneration of spinal cord alpha motoneurons after sciatic nerve injury in rats

Introduction: Injuries of the peripheral nervous system affect the neuronal cell body, leading to axon injury. The Cannabis sativa plant has antioxidant and anti-apoptotic effects. The aim of the present study was therefore to examine the neuroprotective effect of an alcoholic extract of Cannabis sativa leaves on the neuronal density of alpha motoneurons in the spinal cord after sciatic nerve injury in rats. Methods:...

Fuzzy Clustering and Hyperanalytic Wavelet Transform for Lossy Image Compression: A Review

Clustering techniques are mostly unsupervised methods that can be used to organize data into groups based on similarities among the individual data items. Most clustering algorithms do not rely on assumptions common to conventional statistical methods, such as the underlying statistical distribution of data, and therefore they are useful in situations where little prior knowledge exists. The po...

MEDICAL IMAGE COMPRESSION: A REVIEW

In recent years the use of medical images for diagnostic purposes has become a necessity. Limitations in transmission and storage space, together with the growing size of medical images, have created the need for efficient methods; image compression is required as an efficient way to reduce irrelevance and redundancy in the image data so that the data can be stored or transmitted. It also reduces...

Statistical mechanics of lossy data compression using a nonmonotonic perceptron.

The performance of a lossy data compression scheme for uniformly biased Boolean messages is investigated via methods of statistical mechanics. Inspired by a formal similarity to the storage capacity problem in neural network research, we utilize a perceptron of which the transfer function is appropriately designed in order to compress and decode the messages. Employing the replica method, we an...

Text Compression and Language Modeling

Data compression is bound up with grammatical inference and prediction. If we can infer a grammar for some data, we may use that grammar to compress that data and to predict what data is likely to occur in the future. In this paper I describe how the popular text compression techniques actually work. I then compare the results achieved by the ad-hoc dictionary methods with the modern statistica...

Publication date: 2003